perm filename MACHR[4,KMC]1 blob
sn#006442 filedate 1972-10-17 generic text, type T, neo UTF8
00100 COLBY AND MORAVEC
00200
00300
00400 CONTEXT-SENSITIVE FEATURE RECOGNITION FOR COMPUTER UNDERSTANDING OF
00500 NATURAL LANGUAGE IN TELETYPED DIALOGUES
00600
00700
00800 WHY IS IT SO DIFFICULT FOR MACHINES TO UNDERSTAND NATURAL LANGUAGE?
00900 IT IS BECAUSE THEY DO NOT SIMULATE SUFFICIENTLY WHAT PEOPLE DO WHEN
01000 PEOPLE PROCESS LANGUAGE. MANY YEARS OF EXPERIENCE WITH COMPUTER
01100 SCIENCE AND LINGUISTIC APPROACHES HAVE TAUGHT US THE SCOPE AND
01200 LIMITATIONS OF SYNTACTICAL, SEMANTIC AND CONCEPTUAL PARSING [THORNE &
01300 BRATLEY] [SIMMONS] [SCHANK] [WILKS] [WOODS] [WINOGRAD]. WHILE CONVENTIONAL
01400 PARSERS PERFORM SATISFACTORILY WITH EDITED TEXT SENTENCES OR WITH
01500 EXPRESSIONS LIMITED TO A TOY WORLD, THEY ARE INADEQUATE FOR EVERYDAY
01600 LANGUAGE BEHAVIOR SUCH AS TAKES PLACE BETWEEN TWO PEOPLE WHEN THEY
01700 CONVERSE. IN AN UNDERSTANDABLY RATIONALISTIC QUEST FOR CERTAINTY AND
01800 ATTRACTED BY AN ANALOGY FROM THE PROOF THEORY OF LOGICIANS IN WHICH
01900 PROVABILITY IMPLIED COMPUTABILITY, COMPUTATIONAL LINGUISTS HOPED TO
02000 DEVELOP CONTEXT-FREE FORMALISMS FOR NATURAL LANGUAGE. BUT THE HOPE
02100 HAS NOT BEEN REALIZED AND PERHAPS IN PRINCIPLE CANNOT BE. (IT IS
02200 DIFFICULT TO FORMALIZE SOMETHING WHICH CAN HARDLY BE FORMULATED).
02300 IN THEIR DIALOGUES HUMANS ARE NEVER CONTEXT-FREE LINGUISTICALLY OR
02400 CONCEPTUALLY. THE MAIN PROBLEM IS HOW TO MODEL THIS CONTEXT-SENSITIVITY.
02500
02600 LINGUISTIC PARSERS USE MORPHEMIC ANALYSIS TO OBTAIN WORD-ROOTS,
02700 PARTS-OF-SPEECH ASSIGNMENTS AND DICTIONARIES CONTAINING MULTIPLE
02800 WORD-SENSES ALONG WITH SEMANTIC FEATURES WHICH RESTRICT WORD
02900 COMBINATIONS. THEY ANALYZE THE INPUT WORD BY WORD, VALIANTLY
03000 DISAMBIGUATING AT EACH STEP IN AN ATTEMPT TO CONSTRUCT A MEANINGFUL
03100 INTERPRETATION. WHILE SOPHISTICATED COMPUTATIONALLY, SUCH A PARSER
03200 BECOMES PARALYZED BY QUITE ORDINARY CONVERSATION. IN EVERYDAY
03300 DISCOURSE PEOPLE SPEAK COLLOQUIALLY AND IDIOMATICALLY USING ALL SORTS OF PAT
03400 PHRASES (`YOU SAID IT'), SLANG (`LETS RAP') AND CLICHES (`THATS THE
03500 WAY IT GOES'). THEY ARE CRYPTIC AND ELLIPTIC. THEY LACE THEIR
03600 UTTERANCES WITH MUMBLES (`MM-AH'), FUZZ (`WELL NOW LETS SEE') AND
03700 FRAGMENTS (`REALLY'). THEY CONVEY THEIR INTENTIONS AND IDEAS IN BOTH
03800 IDIOSYNCRATIC AND METAPHORICAL WAYS, BLITHELY VIOLATING RULES OF
03900 'CORRECT' GRAMMAR AND SYNTAX. GIVEN THESE DIFFICULTIES, HOW IS
04000 IT THAT PEOPLE CARRY ON CONVERSATIONS EASILY MOST OF THE TIME WHILE
04100 MACHINES HAVE FOUND IT EXTREMELY DIFFICULT TO CONTINUE TO MAKE
04200 CONCEPTUALLY APPROPRIATE REPLIES WHICH COMMUNICATE
04300 UNDERSTANDING? THE OPERATIONS OF CURRENT PARSERS HAVE BEEN
04400 THOUGHTFULLY REVIEWED BY WINOGRAD [ ].
04500
04600
04700 IT SEEMS THAT PEOPLE 'GET THE MESSAGE' WITHOUT ANALYZING EVERY SINGLE
04800 WORD IN THE INPUT. PEOPLE MAKE INDIVIDUALISTIC SELECTIONS FROM
04900 HIGHLY REDUNDANT AND REPETITIOUS COMMUNICATIONS. THESE SELECTIVE
05000 OPERATIONS PRODUCE A TRANSFORMATION OF THE INPUT BY DESTROYING AND
05100 EVEN DISTORTING INFORMATION. IN SPEED READING, FOR EXAMPLE, ONLY A
05200 SMALL PERCENTAGE OF CONTENTIVE WORDS ON EACH PAGE NEED BE LOOKED AT.
05300 THESE WORDS SOMEHOW RESONATE WITH THE READER'S RELEVANT
05400 CONCEPTUAL-INFERENTIAL STRUCTURE WHOSE PROCESSES ENABLE HIM TO
05500 'UNDERSTAND' NOT SIMPLY THE LANGUAGE BUT ALL SORTS OF UNMENTIONED ASPECTS ABOUT
05600 THE SITUATIONS AND EVENTS BEING REFERRED TO BY THE LANGUAGE. IN
05700 WRITTEN TEXTS 5/6 OF THE INPUT CAN BE DISTORTED OR DELETED AND THE
05800 INTENDED MESSAGE CAN STILL SUCCESSFULLY BE EXTRACTED. SPOKEN
05900 CONVERSATIONS IN ENGLISH ARE KNOWN TO BE AT LEAST 50% REDUNDANT. HALF
06000 THE WORDS CAN BE GARBLED AND LISTENERS NONETHELESS GET THE GIST OR
06100 DRIFT OF WHAT IS BEING SAID. (GIVE FURTHER EXPERIMENTAL EVIDENCE
06200 HERE)
06300
06400 TO APPROXIMATE SUCH HUMAN PERFORMANCES AN APPROACH DIFFERENT FROM
06500 THAT OF THE USUAL LINGUISTIC PARSER IS REQUIRED. THIS
06600 ALTERNATE APPROACH SHOULD INCORPORATE KNOWLEDGE GAINED FROM WORK WITH
06700 PARSERS BUT SHOULD UTILIZE PRIMARILY CONCEPTUAL RATHER THAN
06800 GRAMMATICAL FEATURES. PARSERS REPRESENT COMPLEX AND REFINED
06900 ALGORITHMS. WHILE ON ONE HAND THEY SUBJECT A SENTENCE TO A DETAILED
07000 AND SOMETIMES EXCESSIVE ANALYSIS, ON THE OTHER THEY ARE FINICKY AND
07100 OVERSENSITIVE. FOR EXAMPLE, A LINGUISTIC PARSER SIMPLY HALTS IF A
07200 WORD IN THE INPUT SENTENCE IS NOT PRESENT IN ITS DICTIONARY.
07300 UNGRAMMATICAL EXPRESSIONS, FOR EXAMPLE DOUBLE PREPOSITIONS (`DO YOU WANT TO GET OUT OF FROM THE
07400 HOSPITAL?'), ARE QUITE CONFUSING TO THEM. ON INTUITIVE GROUNDS IT
07500 IS HARDLY CREDIBLE THAT PARSERS MODEL THE MECHANISMS PEOPLE USE IN
07600 PROCESSING LANGUAGE. AS CHOMSKY[ ] HAS REMARKED, `WE NOTED AT THE
07700 OUTSET THAT PERFORMANCE AND COMPETENCE MUST BE SHARPLY DISTINGUISHED
07800 IF EITHER IS TO BE STUDIED SUCCESSFULLY. WE HAVE NOW DESCRIBED A
07900 CERTAIN MODEL OF COMPETENCE. IT WOULD BE TEMPTING, BUT QUITE
08000 ABSURD, TO REGARD IT AS A MODEL OF PERFORMANCE AS WELL. THUS WE MIGHT
08100 PROPOSE THAT TO PRODUCE A SENTENCE THE SPEAKER GOES THROUGH THE
08200 SUCCESSIVE STEPS OF CONSTRUCTING A BASE-DERIVATION, LINE BY LINE FROM
08300 THE INITIAL SYMBOL S, THEN INSERTING LEXICAL ITEMS AND APPLYING
08400 GRAMMATICAL TRANSFORMATIONS TO FORM A SURFACE STRUCTURE, AND FINALLY
08500 APPLYING THE PHONOLOGICAL RULES IN THEIR GIVEN ORDER, IN ACCORDANCE
08600 WITH THE CYCLIC PRINCIPLE DISCUSSED ABOVE. THERE IS NOT THE SLIGHTEST
08700 JUSTIFICATION FOR ANY SUCH ASSUMPTION.' IT IS CLEAR FROM THESE
08800 REMARKS THAT THE TRANSFORMATIONAL APPROACH HAS BEEN CONCERNED WITH
08900 PRODUCTION RATHER THAN INTERPRETATION OF SENTENCES AND THAT IT IS NOT
09000 ORIENTED TOWARDS HUMAN PERFORMANCE BUT TOWARDS AN IDEALIZED GRAMMAR
09100 OF COMPETENCE.
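
THE BRITTLENESS DESCRIBED ABOVE CAN BE ILLUSTRATED WITH A MINIMAL SKETCH IN PYTHON. IT IS NOT TAKEN FROM ANY ACTUAL PARSER: THE LEXICON, THE CATEGORY LABELS AND THE FUNCTION NAMES (STRICT_TAG, TOLERANT_TAG) ARE ASSUMPTIONS MADE ONLY FOR ILLUSTRATION. THE STRICT ROUTINE HALTS ON THE FIRST WORD MISSING FROM ITS DICTIONARY, WHILE THE TOLERANT ROUTINE PASSES OVER SUCH WORDS AND KEEPS WHAT IT RECOGNIZES.

```python
# Hypothetical toy lexicon; the words and category labels are illustrative only.
LEXICON = {
    "do": "AUX", "you": "PRON", "want": "VERB", "to": "PREP",
    "get": "VERB", "out": "PREP", "of": "PREP", "from": "PREP",
    "the": "DET", "hospital": "NOUN",
}

PUNCTUATION = "?.,!'`"


class UnknownWord(Exception):
    """Raised by the strict routine when a word is missing from the lexicon."""


def tokenize(sentence):
    # Lowercase each word and strip simple punctuation from its edges.
    words = (w.strip(PUNCTUATION).lower() for w in sentence.split())
    return [w for w in words if w]


def strict_tag(sentence):
    """Tag every word; halt on the first word not in the lexicon."""
    tags = []
    for word in tokenize(sentence):
        if word not in LEXICON:
            raise UnknownWord(word)
        tags.append((word, LEXICON[word]))
    return tags


def tolerant_tag(sentence):
    """Tag the words that are known and skip the rest."""
    return [(w, LEXICON[w]) for w in tokenize(sentence) if w in LEXICON]
```

GIVEN `DO YOU REALLY WANT TO GET OUT OF FROM THE HOSPITAL?', THE STRICT ROUTINE STOPS AT `REALLY', WHEREAS THE TOLERANT ROUTINE STILL RETURNS THE TEN WORDS IT KNOWS, DOUBLE PREPOSITION INCLUDED.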
09200
09300 EARLY ATTEMPTS TO DEVELOP A FEATURE-RECOGNITION APPROACH USING
09400 SPECIAL-PURPOSE HEURISTICS ARE DESCRIBED IN [ ],[ ]. THE LIMITATIONS
09500 OF THESE ATTEMPTS ARE WELL KNOWN TO WORKERS IN ARTIFICIAL
09600 INTELLIGENCE. SUCH PRIMITIVE CONTEXT-RESTRICTED PROGRAMS GRASP A
09700 TOPIC WELL ENOUGH BUT TOO OFTEN DO NOT UNDERSTAND WHAT IS
09800 BEING SAID ABOUT THE TOPIC. THIS SHORTCOMING IS BOTH LINGUISTIC
09900 AND CONCEPTUAL. BECAUSE THE FEATURE-RECOGNITION OF SUCH PROGRAMS IS SIMPLISTIC AND THE
10000 PROGRAMS LACK A RICH CONCEPTUAL STRUCTURE INTO WHICH THE PATTERN
10100 ABSTRACTED FROM THE INPUT CAN BE MATCHED FOR FURTHER INFERENCING,
10200 THE MAN-MACHINE CONVERSATIONS SOON BECOME
10300 IMPOVERISHED AND BORING. WINOGRAD'S PROGRAM, WHILE LIMITED TO A FEW
10400 OBJECTS AND RELATIONS IN A TOY ROBOTIC WORLD, REPRESENTED A GREAT
10500 IMPROVEMENT IN THE FEATURE-RECOGNITION APPROACH. HOWEVER, MANY OF HIS
10600 FEATURES, SUCH AS DETERMINERS AND NOUN GROUPS, WERE GRAMMATICALLY
10700 RATHER THAN CONCEPTUALLY ORIENTED. ANOTHER FEATURE-RECOGNITION APPROACH IS
10800 THAT OF WILKS[ ] WORKING IN THE AREA OF MACHINE TRANSLATION. HIS
10900 ALGORITHM CONSTRUCTS A PATTERN FROM ENGLISH TEXT INPUT WHICH IS
11000 MATCHED AGAINST TEMPLATES IN AN INTERLINGUAL DATA BASE FROM WHICH, IN
11100 TURN, FRENCH OUTPUT IS GENERATED WITHOUT USING A GENERATIVE GRAMMAR.
11200
11300 IN THE COURSE OF CONSTRUCTING A COMPUTER SIMULATION OF PARANOIA WE
11400 WERE FACED WITH THE PROBLEM OF DEALING WITH NATURAL LANGUAGE AS IT IS
11500 USED IN THE DOCTOR-PATIENT SITUATION OF A PSYCHIATRIC INTERVIEW. THIS
11600 DOMAIN OF DISCOURSE ADMITTEDLY CONTAINS MANY STEREOTYPES (`WHAT BROUGHT
11700 YOU TO THE HOSPITAL?') AND IS CONSTRAINED IN TOPICS (NEWTON'S LAWS
11800 ARE RARELY DISCUSSED). BUT IT IS RICH ENOUGH IN VERBAL BEHAVIOR TO BE A CHALLENGE TO A
11900 LANGUAGE UNDERSTANDING ALGORITHM SINCE A GREAT VARIETY OF HUMAN RELATIONS
12000 ARE DISCUSSED IN THIS DOMAIN INCLUDING THAT WHICH DEVELOPS BETWEEN
12100 THE INTERVIEW PARTICIPANTS. THE JUDGEMENT OF 'PARANOIA' IS MADE BY
12200 PSYCHIATRISTS RELYING MAINLY ON THE VERBAL BEHAVIOR OF THE
12300 INTERVIEWED PATIENT. IF A PARANOID MODEL IS TO EXHIBIT PARANOID
12400 BEHAVIOR IN A PSYCHIATRIC INTERVIEW, IT MUST BE CAPABLE OF HANDLING
12500 DIALOGUES TYPICAL OF THE DOCTOR-PATIENT CONTEXT. SINCE THE MODEL
12600 CAN COMMUNICATE ONLY THROUGH TELETYPED MESSAGES, THE VIS-A-VIS ASPECTS
12700 OF THE USUAL PSYCHIATRIC INTERVIEW ARE ABSENT. THUS THE MODEL SHOULD
12800 BE ABLE TO DEAL WITH TYPEWRITTEN NATURAL LANGUAGE INPUT AND TO OUTPUT
12900 REPLIES WHICH ARE INDICATIVE OF AN UNDERLYING PARANOID THOUGHT
13000 PROCESS.
13100
13200 IN A PSYCHIATRIC INTERVIEW THERE IS ALWAYS A WHO SAYING SOMETHING TO
13300 A WHOM WITH DEFINITE INTENTIONS AND EXPECTATIONS. THERE ARE TWO SITUATIONS
13400 TO BE TAKEN INTO ACCOUNT: THE ONE BEING TALKED ABOUT AND THE ONE THE PARTICIPANTS ARE IN.
13500 SOMETIMES THE LATTER BECOMES THE FORMER. AS WEIZENBAUM [ ] HAS
13600 EMPHASIZED FOR COMPUTER SCIENTISTS, DIALOGUES HAVE PURPOSES AND
13700 MACHINES MUST RECOGNIZE THIS FACT. THE DOCTOR'S PURPOSE IS TO GATHER
13800 CERTAIN KINDS OF INFORMATION WHILE THE PATIENT'S PURPOSE IS TO GIVE
13900 INFORMATION AND GET HELP. THAT IS, A JOB IS TO BE DONE. OUR WORKING HYPOTHESIS IS
14000 THAT EACH PARTICIPANT IN THE DIALOGUE UNDERSTANDS THE OTHER BY
14100 MATCHING SELECTED SIGNIFICANT FEATURES IN THE INPUT AGAINST STORED
14200 CONCEPTUAL PATTERNS WHICH CONTAIN INFORMATION ABOUT THE SITUATION OR
14300 EVENT BEING DESCRIBED LINGUISTICALLY. THIS UNDERSTANDING IS
14400 COMMUNICATED RECIPROCALLY BY LINGUISTIC RESPONSES JUDGED APPROPRIATE
14500 TO THE INTENTIONS AND EXPECTATIONS OF THE PARTICIPANTS. IN THIS PAPER WE SHALL DESCRIBE
14600 ONLY THE CONTEXT-SENSITIVE FEATURE-RECOGNITION PROCESSES USED TO
14700 EXTRACT A PATTERN FROM NATURAL LANGUAGE INPUT. IN A LATER
14800 COMMUNICATION WE SHALL DESCRIBE THE INFERENTIAL PROCESSES CARRIED OUT
14900 AT THE CONCEPTUAL LEVEL ONCE THE `PARADIGMATIC' PATTERN HAS BEEN RECEIVED FROM THE
15000 FEATURE-RECOGNITION PROCESSES.
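
THE WORKING HYPOTHESIS STATED ABOVE, THAT SELECTED FEATURES OF THE INPUT ARE MATCHED AGAINST STORED CONCEPTUAL PATTERNS, CAN BE SKETCHED ROUGHLY IN PYTHON. THIS IS NOT THE FEATURE RECOGNIZER OF THE MODEL, WHICH IS NOT DESCRIBED IN THIS SECTION. THE PATTERN NAMES, THE FEATURE SETS, THE COVERAGE SCORE AND THE THRESHOLD OF 0.6 ARE ALL ASSUMPTIONS CHOSEN FOR ILLUSTRATION. THE SKETCH SHOWS ONLY THE GENERAL IDEA: MOST WORDS ARE IGNORED, AND THE FEW THAT ARE KEPT SELECT A STORED PATTERN.

```python
# Hypothetical stored patterns: each names a concept and the features that signal it.
PATTERNS = {
    "REASON-FOR-ADMISSION": {"what", "brought", "hospital"},
    "WISH-TO-LEAVE": {"want", "out", "hospital"},
    "GREETING": {"hello"},
}

# A word counts as significant (by assumption) if it appears in some pattern.
SIGNIFICANT = set().union(*PATTERNS.values())

PUNCTUATION = "?.,!'`"


def extract_features(utterance):
    """Keep only the significant words; every other word is ignored."""
    words = (w.strip(PUNCTUATION).lower() for w in utterance.split())
    return {w for w in words if w in SIGNIFICANT}


def best_pattern(utterance, threshold=0.6):
    """Return the pattern whose features are best covered by the input, or None."""
    features = extract_features(utterance)
    best_name, best_score = None, 0.0
    for name, required in PATTERNS.items():
        # Fraction of the pattern's features that were found in the input.
        score = len(features & required) / len(required)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    print(best_pattern("WELL NOW LETS SEE, WHAT BROUGHT YOU TO THE HOSPITAL?"))
    print(best_pattern("DO YOU WANT TO GET OUT OF FROM THE HOSPITAL?"))
```

BECAUSE ONLY THE SIGNIFICANT WORDS ARE CONSULTED, FUZZ SUCH AS `WELL NOW LETS SEE' AND THE DOUBLE PREPOSITION `OUT OF FROM' DO NOT DISTURB THE MATCH. AN UTTERANCE WITH NO SIGNIFICANT WORDS YIELDS NO PATTERN RATHER THAN A HALT.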
15100
15200
15300 (HANS WRITES DESCRIPTION OF HIS FEATURE RECOGNIZER)